Robust Regression on MapReduce
Abstract
Although the MapReduce framework is now the de facto standard for analyzing massive data sets, many algorithms (in particular, many iterative algorithms popular in machine learning, optimization, and linear algebra) are hard to fit into MapReduce. Consider, e.g., the $\ell_p$ regression problem: given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^{m}$, find a vector $x^* \in \mathbb{R}^{n}$ that minimizes $f(x) = \|Ax - b\|_p$. The widely used $\ell_2$ regression, i.e., linear least-squares, is known to be highly sensitive to outliers, and choosing $p \in [1, 2)$ can help improve robustness. In this work, we propose an efficient algorithm for solving strongly over-determined ($m \gg n$) robust $\ell_p$ regression problems to moderate precision on MapReduce. Our empirical results on data up to the terabyte scale demonstrate that our algorithm is a significant improvement over traditional iterative algorithms on MapReduce for $\ell_1$ regression, even for a fairly small number of iterations. In addition, our proposed interior-point cutting-plane method can also be extended to solving more general convex problems on MapReduce.
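To make the robustness claim concrete, the following is a minimal single-machine sketch, not the interior-point cutting-plane method proposed in the paper and not a MapReduce implementation. It assumes NumPy and swaps in iteratively reweighted least squares (IRLS) as a simple stand-in solver for $\min_x \|Ax - b\|_p$ with $p \in [1, 2]$. The function names `lp_regression_irls` and `demo`, the problem sizes, and the outlier model are all hypothetical choices made for illustration.

```python
import numpy as np


def lp_regression_irls(A, b, p=1.0, iters=50, eps=1e-8):
    """Approximately minimize ||Ax - b||_p for p in [1, 2] using IRLS.

    Illustrative stand-in solver only; it is not the algorithm of the paper.
    """
    # Start from the ordinary least-squares (p = 2) solution.
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        r = A @ x - b
        # IRLS weights |r_i|^(p - 2); eps guards against division by zero.
        w = np.maximum(np.abs(r), eps) ** (p - 2)
        sw = np.sqrt(w)
        # Solve the weighted least-squares subproblem.
        x = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return x


def demo(seed=0):
    """Compare l2 and l1 fits on a strongly over-determined problem (m >> n)."""
    rng = np.random.default_rng(seed)
    m, n = 2000, 5
    A = rng.standard_normal((m, n))
    x_true = np.arange(1.0, n + 1)
    b = A @ x_true + 0.01 * rng.standard_normal(m)
    # Corrupt 5% of the observations with large outliers (assumed outlier model).
    outliers = rng.choice(m, size=m // 20, replace=False)
    b[outliers] += 100.0
    x_l2 = np.linalg.lstsq(A, b, rcond=None)[0]
    x_l1 = lp_regression_irls(A, b, p=1.0)
    # Return the distance of each estimate from the true coefficients.
    return np.linalg.norm(x_l2 - x_true), np.linalg.norm(x_l1 - x_true)


if __name__ == "__main__":
    err_l2, err_l1 = demo()
    print(f"l2 error: {err_l2:.4f}, l1 error: {err_l1:.4f}")
```

On this synthetic data the $\ell_1$ estimate should land much closer to the true coefficients than the least-squares estimate, which is the sensitivity to outliers that motivates choosing $p \in [1, 2)$. Each IRLS step needs a full pass over the data, which hints at why iterative methods of this kind are costly when every pass is a MapReduce job.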
Similar resources
Support vector regression model for BigData systems
Nowadays Big Data are becoming more and more important. Many sectors of our economy are now guided by data-driven decision processes. Big Data and business intelligence applications are facilitated by the MapReduce programming model while, at the infrastructure layer, cloud computing provides flexible and cost-effective solutions for allocating large clusters on demand. In such systems, capacity a...
Large-Scale Non-Linear Regression within the MapReduce Framework
Large-scale Non-linear Regression within the MapReduce Framework. By: Ahmed Khademzadeh. Thesis Advisor: Philip Chan, Ph.D. Regression models have many applications in real-world problems such as finance, epidemiology, and environmental science. Big datasets are everywhere these days, and bigger datasets would help us to construct better models from the data. The issue with big datasets is that...
A robust least squares fuzzy regression model based on kernel function
In this paper, a new approach is presented to fit a robust fuzzy regression model based on some fuzzy quantities. In this approach, we first introduce a new distance between two fuzzy numbers using the kernel function, and then, based on the least squares method, the parameters of the fuzzy regression model are estimated. The proposed approach has a suitable performance to...
Parallel extreme learning machine for regression based on MapReduce
Regression is one of the most basic problems in data mining. For regression problems, the extreme learning machine (ELM) can achieve better generalization performance at a much faster learning speed. However, the growing volume of datasets makes regression by ELM on very large-scale datasets a challenging task. Through analyzing the mechanism of the ELM algorithm, an efficient parallel ELM for regression ...
ROUTE: run-time robust reducer workload estimation for MapReduce
MapReduce has become a popular model for large-scale data processing in recent years. Many works on MapReduce scheduling (e.g., load balancing and deadline-aware scheduling) have emphasized the importance of predicting workload received by individual reducers. However, because the input characteristics and user-specified map function of a given job are unknown to the MapReduce framework before ...
Publication date: 2013